K-Means Method for Grouping in Hybrid MapReduce Cluster
Authors
Abstract
In the hybrid cloud computing era, hybrid clusters consisting of virtual machines and physical machines are becoming increasingly popular. MapReduce is a powerful tool in this big data era, in which social computing and multimedia computing are emerging. One of the biggest challenges in a hybrid MapReduce cluster is the I/O bottleneck, which is aggravated under big data computing. In this paper, we take data locality into consideration and group slave nodes so that inter-group communication is low and intra-group communication is high. After introducing the architecture and implementation of our grouped hybrid MapReduce cluster (GHMC), we present our k-means algorithm in GHMC and evaluate it in a real environment. The results show that the k-means algorithm achieves a performance improvement of nearly 34.9% in our system. Moreover, the GHMC system also shows good scalability.
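The grouping idea in the abstract can be illustrated with a minimal sketch of plain k-means (Lloyd's algorithm). The abstract does not say which features GHMC clusters on, so everything specific here is an assumption: each slave node is described by a hypothetical two-dimensional vector (for example, measured communication cost to two reference nodes), and the node names, the `kmeans` and `group_nodes` helpers, and the choice of the first k points as initial centroids are illustrative only, not the authors' implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, labels).

    Assumption: the first k points seed the centroids, which keeps the
    sketch deterministic. Real deployments usually use random or
    k-means++ initialization.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def group_nodes(nodes, k):
    """Map each group index to the list of node names assigned to it."""
    names = list(nodes)
    _, labels = kmeans([nodes[n] for n in names], k)
    groups = {}
    for name, label in zip(names, labels):
        groups.setdefault(label, []).append(name)
    return groups


# Hypothetical feature vectors for three virtual and three physical machines.
nodes = {
    "vm-1": (1.0, 1.2),
    "pm-1": (9.0, 9.5),
    "vm-2": (0.8, 1.0),
    "pm-2": (9.2, 8.8),
    "vm-3": (1.1, 0.9),
    "pm-3": (8.7, 9.1),
}
groups = group_nodes(nodes, 2)
```

With these made-up vectors the nodes that are close to each other end up in the same group, which is the property the abstract relies on to keep most traffic inside a group.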
Similar resources
A Parallel Clustering Method Study Based on MapReduce
Clustering is considered the most important task in data mining. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. Many practical application problems can be solved with clustering methods. Clustering has been widely applied in many areas, such as marketing, biology, libraries, insurance, earthquake studies, and the World Wide Web. Many clustering ...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group or cluster are more similar to each other in some sense. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
A hybrid DEA-based K-means and invasive weed optimization for facility location problem
In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...
Parallel k means clustering based on mapreduce pdf
Parallel K-Means Clustering Based on. Weizhong Zhao1, 2, Huifang Ma1, 2, and Qing He1. The Key Laboratory of Intelligent Information. The K-Means clustering is a basic method in analyzing RS remote sensing images.
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are used in so many fields, a great deal of research is still under way to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of cluster centers and can therefore become trapped in a local optimum. In this paper...
Journal title:
- JCP
Volume 8, Issue
Pages -
Publication date 2013